
feat: new LIME and KernelSHAP explainers #1077

Merged: 87 commits, merged Jun 18, 2021

@memoryz (Contributor) commented Jun 9, 2021

In this PR, we rewrote the LIME explainers and added KernelSHAP explainers in the com.microsoft.ml.spark.explainers package.

New features:

  • KernelSHAP explainers for tabular, vector, image, and text models.
  • The LIME explainers now support kernel width and sample weights.
  • Both explainers support categorical variables (in the tabular explainer).
  • Both explainers report the r-squared metric of the underlying regression model.
  • Both explainers can explain multiple output classes in a single run.
  • For tabular and vector models, both explainers accept a background dataframe. If none is given, the dataframe being explained is also used as the background data.
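The weighting and masking ideas behind these features can be sketched in a few lines of Scala. This is a minimal sketch, not the code in `com.microsoft.ml.spark.explainers`: the object `ExplainerMath` and all of its functions are hypothetical. `shapKernelWeight` is the standard KernelSHAP coalition weight (M - 1) / (C(M, s) * s * (M - s)); `limeKernelWeight` assumes the exponential kernel sqrt(exp(-d^2 / width^2)) used by the reference Python `lime` package, and whether this PR uses the same form is an assumption; `perturb` shows how a background row fills in features outside a coalition; `weightedR2` is one way to compute the r-squared of a weighted surrogate fit.

```scala
// Hypothetical sketch of explainer math; not the implementation from this PR.
object ExplainerMath {

  /** Binomial coefficient C(n, k), computed in floating point. */
  def choose(n: Int, k: Int): Double =
    (1 to k).foldLeft(1.0)((acc, i) => acc * (n - k + i) / i)

  /** KernelSHAP weight for a coalition of s present features out of m.
    * The empty and full coalitions (s == 0 or s == m) have infinite weight.
    */
  def shapKernelWeight(m: Int, s: Int): Double = {
    require(m >= 1 && s >= 0 && s <= m, s"invalid coalition size $s for $m features")
    if (s == 0 || s == m) Double.PositiveInfinity
    else (m - 1).toDouble / (choose(m, s) * s * (m - s))
  }

  /** Assumed LIME-style exponential kernel: samples close to the instance (small distance) get weights near 1. */
  def limeKernelWeight(distance: Double, kernelWidth: Double): Double =
    math.sqrt(math.exp(-(distance * distance) / (kernelWidth * kernelWidth)))

  /** Keep the instance's value where coalition(i) is true; otherwise take the background row's value. */
  def perturb(instance: Array[Double], background: Array[Double], coalition: Array[Boolean]): Array[Double] = {
    require(instance.length == background.length && instance.length == coalition.length)
    Array.tabulate(instance.length)(i => if (coalition(i)) instance(i) else background(i))
  }

  /** Weighted R^2 of surrogate predictions yHat against targets y, using sample weights w. */
  def weightedR2(y: Array[Double], yHat: Array[Double], w: Array[Double]): Double = {
    val sumW = w.sum
    val mean = y.indices.map(i => w(i) * y(i)).sum / sumW
    val ssRes = y.indices.map(i => w(i) * math.pow(y(i) - yHat(i), 2)).sum
    val ssTot = y.indices.map(i => w(i) * math.pow(y(i) - mean, 2)).sum
    1.0 - ssRes / ssTot
  }
}
```

Because the empty and full coalitions get infinite weight, KernelSHAP implementations typically enforce them as constraints or substitute a large finite weight; the sketch leaves that choice to the caller.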

Sample notebooks will be included in the next PR.

@memoryz memoryz requested a review from mhamilton723 June 9, 2021 17:21
@memoryz (Contributor Author) commented Jun 9, 2021

/azp run

@azure-pipelines: Azure Pipelines successfully started running 1 pipeline(s).


@codecov bot commented Jun 9, 2021

Codecov Report

Merging #1077 (1364d30) into master (00bac62) will increase coverage by 1.15%.
The diff coverage is 93.86%.


```diff
@@            Coverage Diff             @@
##           master    #1077      +/-   ##
==========================================
+ Coverage   84.34%   85.50%   +1.15%     
==========================================
  Files         208      232      +24     
  Lines        9789    10484     +695     
  Branches      565      601      +36     
==========================================
+ Hits         8257     8964     +707     
+ Misses       1532     1520      -12     
```
| Impacted Files | Coverage Δ |
|---|---|
| ...a/com/microsoft/ml/spark/core/utils/RowUtils.scala | 0.00% <0.00%> (ø) |
| ...a/com/microsoft/ml/spark/explainers/RowUtils.scala | 11.11% <11.11%> (ø) |
| ...om/microsoft/ml/spark/explainers/BreezeUtils.scala | 50.00% <50.00%> (ø) |
| ...ala/org/apache/spark/ml/param/DataFrameParam.scala | 70.83% <57.14%> (+14.31%) ⬆️ |
| ...microsoft/ml/spark/explainers/LocalExplainer.scala | 76.92% <76.92%> (ø) |
| ...m/microsoft/ml/spark/explainers/FeatureStats.scala | 87.50% <87.50%> (ø) |
| ...om/microsoft/ml/spark/explainers/TabularSHAP.scala | 91.66% <91.66%> (ø) |
| .../com/microsoft/ml/spark/explainers/ImageSHAP.scala | 93.10% <93.10%> (ø) |
| .../com/microsoft/ml/spark/explainers/ImageLIME.scala | 93.33% <93.33%> (ø) |
| ...com/microsoft/ml/spark/explainers/VectorLIME.scala | 93.33% <93.33%> (ø) |

... and 67 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00bac62...1364d30.
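The summary numbers in the Coverage Diff table are consistent with coverage = hits / lines, truncated (not rounded) to two decimals: 8257 / 9789 is about 84.3498%, which is reported as 84.34%. The sketch below checks this with integer arithmetic. The object `CoverageCheck` is hypothetical, and truncation is an inference from these numbers (it appears to match Codecov's default `round: down` setting).

```scala
// Hypothetical helper that reproduces the Codecov summary from raw counts.
// Values are in hundredths of a percent (8434 means 84.34%); Long avoids Int overflow.
object CoverageCheck {
  /** hits / lines as a percentage, truncated to two decimals. */
  def coverageHundredths(hits: Long, lines: Long): Long =
    hits * 10000L / lines

  /** Exact coverage change from report A to report B, truncated toward zero. */
  def deltaHundredths(hitsA: Long, linesA: Long, hitsB: Long, linesB: Long): Long =
    (hitsB * linesA - hitsA * linesB) * 10000L / (linesA * linesB)
}

// Counts from the table: master (00bac62) and #1077 (1364d30).
val before = CoverageCheck.coverageHundredths(8257, 9789)            // 8434, i.e. 84.34%
val after  = CoverageCheck.coverageHundredths(8964, 10484)           // 8550, i.e. 85.50%
val delta  = CoverageCheck.deltaHundredths(8257, 9789, 8964, 10484)  // 115, i.e. +1.15%
```

Subtracting the two truncated values would give 85.50 - 84.34 = 1.16, so the reported +1.15% appears to come from the exact difference (about 1.1519 percentage points) truncated to two decimals.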

```diff
 protected lazy val pyParamsDefinitions: String = {
   this.params.map { p =>
     val typeConverterString = getParamInfo(p).pyTypeConverter.map(", typeConverter=" + _).getOrElse("")
-    s"""|${p.name} = Param(Params._dummy(), "${p.name}", "${p.doc}"$typeConverterString)
+    s"""|${p.name} = Param(Params._dummy(), "${p.name}", "${escape(p.doc)}"$typeConverterString)
```
Collaborator commented:

TY!
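The change above passes `p.doc` through `escape` before splicing it into the generated Python source, presumably so that quotes or backslashes in a parameter's doc string cannot break the generated string literal. The repository's `escape` is not shown here, so the following is only a sketch of what such a helper could look like; `PyCodegen`, `escapePyString`, and `paramDefinition` are hypothetical names, and the `|` stripMargin prefix from the original template is omitted.

```scala
// Hypothetical sketch; not the repository's actual `escape` helper.
object PyCodegen {
  /** Escape a string so it can sit inside a double-quoted Python string literal. */
  def escapePyString(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '\\' => sb.append("\\\\") // backslash becomes \\
      case '"'  => sb.append("\\\"") // double quote becomes \"
      case '\n' => sb.append("\\n")  // newline becomes the two characters \n
      case '\r' => sb.append("\\r")  // carriage return becomes \r
      case c    => sb.append(c)      // everything else is copied unchanged
    }
    sb.toString
  }

  /** Build one generated Python Param definition, escaping the doc string. */
  def paramDefinition(name: String, doc: String, typeConverter: Option[String]): String = {
    val typeConverterString = typeConverter.map(", typeConverter=" + _).getOrElse("")
    s"""$name = Param(Params._dummy(), "$name", "${escapePyString(doc)}"$typeConverterString)"""
  }
}
```

For example, a doc string containing `"k"` is emitted as `\"k\"`, so the generated Python literal stays well-formed.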

```diff
-    counter += 1
-  }
-  unusedColumnName
+  val stream = Iterator(prefix) ++ Iterator.from(1, 1).map(prefix + "_" + _)
```
Collaborator commented:

😎
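The new line replaces the counter loop with a lazy iterator of candidate names. A standalone sketch of the same pattern follows; `ColumnNames.findUnusedColumnName` and its `existingColumns` parameter are hypothetical stand-ins for the surrounding method and a DataFrame's column names.

```scala
// Hypothetical standalone version of the iterator pattern from the diff above.
object ColumnNames {
  /** Return `prefix` if it is free, otherwise the first free `prefix_N` (N = 1, 2, ...). */
  def findUnusedColumnName(prefix: String, existingColumns: Set[String]): String = {
    // Lazily yields prefix, prefix_1, prefix_2, ...; candidates are generated on demand.
    val candidates = Iterator(prefix) ++ Iterator.from(1, 1).map(prefix + "_" + _)
    // The iterator is infinite and existingColumns is finite, so find always succeeds.
    candidates.find(name => !existingColumns.contains(name)).get
  }
}
```

With existing columns `id` and `id_1`, `findUnusedColumnName("id", ...)` returns `id_2`. Compared with a mutable counter, the iterator keeps the method a single expression and evaluates candidates only until the first free name is found.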


```diff
@@ -6,10 +6,13 @@ package com.microsoft.ml.spark.core.utils
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRow

// This class currently has no usage. Should we just remove it?
@deprecated("This is a copy of Row.merge function from Spark, which was marked deprecated.", "1.0.0-rc3")
```
Collaborator commented:

Yes, we can remove it.

@memoryz memoryz merged commit 7dd6bb1 into microsoft:master Jun 18, 2021
@memoryz memoryz deleted the jasowang/lime branch June 18, 2021 19:07

3 participants